1.  Introduction


The objective of this program is to realize dense, modifiable interconnections in systems such as neural networks using photorefractive volume holograms. In this report we describe an experimental two-layer optical neural network, built at Caltech using a photorefractive approach, for recognition of the handwritten alphabet (A-Z).

Twenty years ago, Minsky and Papert proved that a single-layer perceptron cannot solve problems outside a very narrow class, a result that put an end to the early efforts in neural network research [1]. The recent resurgence in this field was initiated partly by the discovery of more complex artificial neural network architectures, such as multilayer networks, and related learning algorithms [2]. The use of multilayer networks was further justified by Hornik et al., who showed that standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function to any desired degree of accuracy, provided sufficiently many hidden units are available [3]. Optics is particularly well suited to the implementation of feedforward multilayer neural networks because of the high parallelism that optics provides and the similarity between feedforward structures and classical optical correlators [4,5]. Most important is the maturing of several critical technologies that are necessary for the realization of multilayer learning networks, such as spatial light modulators with light amplification and nonlinear thresholding capabilities [6] and dynamic photorefractive volume holograms [7]. In this report we describe an experiment in which such devices are used to implement a multilayer network.

The system that we built is a two-layer network trained according to Kanerva's model of Sparse, Distributed Memory (SDM) [8]. Kanerva's learning model was chosen because it is relatively easy to implement compared with other learning algorithms. The system uses photorefractive holograms to implement synaptic interconnections and liquid crystal light valves (LCLVs) to perform nonlinear thresholding operations. The first layer has fixed, random interconnection weights, which map each input pattern into a very large, sparse, distributed internal representation. The second layer is trained by the sum-of-outer-products rule, which associates the internal representations of different classes of characters with different responses of the output neurons. It is shown that the trained network can recognize not only all the training patterns but also about 60% of test patterns that it has never seen.

2.  Sparse,  Distributed  Memory  Model 

In this section we briefly review Kanerva's Sparse, Distributed Memory (SDM) [9] to point out the characteristics that the optical system must incorporate. A schematic representation of a two-layer network is shown in Fig. 1. It consists of an input layer globally interconnected to a hidden layer, which is in turn interconnected through a second weighted network to an output layer. The system is trained so that the desired outputs $Y^{(\alpha)}$ are produced for the respective input patterns $X^{(\alpha)}$, $\alpha = 1, \ldots, M$. Moreover, an output $Y$ of the network is close to $Y^{(\alpha)}$ when the system is presented with an input $X$ close to $X^{(\alpha)}$. Here $X^{(\alpha)}$ and $Y^{(\alpha)}$ are real vectors of length $n$ and $m$, respectively, with components restricted to the binary set $B = \{-1, +1\}$. In general, the interconnection weights of both layers are modifiable, so that the system can be trained to perform a desired pattern transformation from the input space to the output space. In SDM, however, the first layer acts as a fixed-weight preprocessor encoding each $n$-bit input into a very large $s$-bit internal representation, $s \gg n$. The second layer is a trainable sum-of-outer-products network, which is programmed to recognize the higher-dimensional internal representations. Kanerva's primary contribution is the specification of the preprocessor, that is, how to map each $n$-bit input into a very large $s$-bit internal representation in such a way that the capacity can exceed by far any linear function of the input dimension. This is important because in most applications the dimension of the input (which is approximately equal to the capacity of a single-layer machine) is much smaller than the number of patterns we wish to recognize.

Consider the $s$-bit internal representation to be a binary vector embedded in $\mathbb{R}^s$, with components restricted to zero and one, and let $f_\theta: \mathbb{R}^s \to \mathbb{R}^s$ be the function that applies the unit step function (translated by $\theta$) to each coordinate independently. That is, the $i$th coordinate of $f_\theta(U)$ is 1 if $U_i \ge \theta$ and 0 if $U_i < \theta$.

The operation of the first layer can now be easily described. The $s \times n$ weight matrix $Z$ is populated at random by $+1$'s and $-1$'s. The input vector to the hidden neurons is given by the matrix-vector product $ZX$, which is thresholded by the function $f_\theta$ to become the output vector $H = f_\theta(ZX)$ of the hidden neurons. Because $X$ and the rows $Z_i$ of $Z$ have $\pm 1$ components, $(ZX)_i = n - 2\, d_H(X, Z_i)$, where $d_H$ denotes Hamming distance. With $\theta = n - 2r$, the $s$-bit word $H$ therefore contains a one in the $i$th coordinate if and only if $X$ is within Hamming distance $r$ of the $i$th row of $Z$. If the parameters $r$ and $s$ are set correctly, then the number of 1's in the representation $H$ will be very small in comparison to the number of 0's. Hence $H$ can be considered a sparse, distributed representation of $X$: sparse because there are few 1's, and distributed because several 1's share in the representation of $X$.
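The first-layer encoding can be sketched in a few lines of NumPy. This is a minimal software simulation, not a model of the optics: the sizes n = 100 (matching the 10 × 10 input grid), s = 5000, and r = 35 are assumptions chosen for illustration, and the helper names `f_theta` and `first_layer` are hypothetical. The assertion checks the Hamming-distance interpretation of $\theta = n - 2r$.

```python
import numpy as np

def f_theta(u, theta):
    """Translated unit step, applied coordinate-wise: 1 if u_i >= theta, else 0."""
    return (u >= theta).astype(np.int64)

def first_layer(Z, x, r):
    """Hidden-layer output H = f_theta(Z x) with threshold theta = n - 2r."""
    n = Z.shape[1]
    return f_theta(Z @ x, n - 2 * r)

# Hypothetical sizes: n matches the 10 x 10 input grid; s and r are illustrative choices.
n, s, r = 100, 5000, 35
rng = np.random.default_rng(0)
Z = rng.choice([-1, 1], size=(s, n))   # fixed random +/-1 weights
x = rng.choice([-1, 1], size=n)        # one +/-1 input pattern

H = first_layer(Z, x, r)

# For +/-1 vectors, (Z x)_i = n - 2 * (Hamming distance between x and row i),
# so H_i = 1 exactly when row i lies within Hamming distance r of x.
hamming = np.sum(Z != x, axis=1)
assert np.array_equal(H, (hamming <= r).astype(np.int64))
print(int(H.sum()), "of", s, "hidden units active")
```

With these numbers only about 0.2% of random rows fall within distance 35 of a given input, so $H$ typically has on the order of ten ones out of 5000, which is sparse in exactly the sense described above.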

The overall SDM can be regarded as a sum-of-outer-products associative memory operating on the sparse, distributed representation of $X$. Let $g$ be the vector signum function, which takes the sign of each coordinate independently. Then the response of the output neurons is $Y = g(WH)$, where the synaptic weight matrix $W$ for the second layer is given by

$$W = \sum_{\alpha=1}^{M} Y^{(\alpha)} \left[ f_\theta\!\left(Z X^{(\alpha)}\right) \right]^{T}. \tag{1}$$
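Putting the two layers together gives a complete, if idealized, software SDM. The sketch below assumes random ±1 prototypes in place of handwritten characters, 26 output neurons coded as +1 for the correct class and −1 elsewhere, and hypothetical sizes s = 20000, r = 35; `train` and `recall` are illustrative names, not part of the original system. Training is a single pass of Eq. (1), and recall is $Y = g(WH)$.

```python
import numpy as np

def f_theta(U, theta):
    # Translated unit step on every coordinate: 1 if U_i >= theta, else 0.
    return (U >= theta).astype(np.int64)

def train(Z, X, Y, r):
    """Store pairs (X^(a), Y^(a)) given as columns of X (n x M) and Y (k x M).

    Layer 1 is fixed: H^(a) = f_theta(Z X^(a)) with theta = n - 2r.
    Layer 2 uses the sum-of-outer-products rule, W = sum_a Y^(a) H^(a)^T.
    """
    theta = Z.shape[1] - 2 * r
    H = f_theta(Z @ X, theta)   # s x M matrix of sparse hidden codes
    return Y @ H.T              # k x s weight matrix W

def recall(Z, W, x, r):
    """Network response Y = g(W f_theta(Z x)), with g the coordinate-wise signum."""
    h = f_theta(Z @ x, Z.shape[1] - 2 * r)
    return np.sign(W @ h)

rng = np.random.default_rng(1)
n, s, r, M = 100, 20000, 35, 26       # hypothetical sizes for the sketch
Z = rng.choice([-1, 1], size=(s, n))  # fixed random first-layer weights
X = rng.choice([-1, 1], size=(n, M))  # M random "prototype" inputs, one per class
Y = -np.ones((M, M), dtype=np.int64)  # output k is +1 only for class k
np.fill_diagonal(Y, 1)

W = train(Z, X, Y, r)
hits = sum(np.array_equal(recall(Z, W, X[:, a], r), Y[:, a]) for a in range(M))
print(hits, "of", M, "stored patterns recalled exactly")

# A corrupted copy of pattern 0 (5 of 100 bits flipped), read out by the strongest output.
noisy = X[:, 0].copy()
noisy[rng.choice(n, size=5, replace=False)] *= -1
h = f_theta(Z @ noisy, n - 2 * r)
print("strongest output neuron:", int(np.argmax(W @ h)))
```

The last lines mirror the readout used in the experiment, where a pattern is assigned to the output neuron with the maximum response; with these parameters the hidden code of the noisy input still overlaps that of pattern 0, so `argmax` should usually return 0.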

It has been shown [10] that by allowing $s$, the dimension of the hidden layer, to grow exponentially with the input dimension $n$, the capacity of the SDM can grow exponentially in $n$, achieving the universal upper bound of any associative memory. This is in sharp contrast to the capacity of a single-layer associative memory, which grows at most linearly with the input dimension. In terms of pattern recognition, a large $s$ implies mapping input vectors into a higher-dimensional space, where it is much easier to find appropriate decision boundaries. In this way, a problem that is not linearly separable at the input can be converted into a linearly separable one at the hidden layer [11].
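A tiny example of this effect, as a sketch: XOR on two ±1 inputs is not linearly separable, so no single signum unit on the raw inputs can realize it, but after a random SDM expansion the sum-of-outer-products layer stores it without error. The choices s = 64 and r = 0 are assumptions for illustration; with r = 0 each hidden unit fires only for the single input equal to its row of $Z$, an extreme case of the sparse code.

```python
import numpy as np

# XOR on +/-1 inputs: the target is +1 exactly when the two inputs differ.
X = np.array([[-1, -1, 1, 1],
              [-1, 1, -1, 1]])           # columns are the four input patterns
Y = np.array([[-1, 1, 1, -1]])           # one output neuron, one target per column

# Hypothetical expansion: s random +/-1 rows, Hamming radius r = 0.
s, r = 64, 0
rng = np.random.default_rng(2)
Z = rng.choice([-1, 1], size=(s, X.shape[0]))
theta = X.shape[0] - 2 * r
H = (Z @ X >= theta).astype(np.int64)    # s x 4 matrix of hidden codes

W = Y @ H.T                              # Eq. (1): sum of outer products
print(np.sign(W @ H))                    # responses to the four training inputs
```

Because each hidden unit responds to only one of the four patterns, the hidden codes are orthogonal, and the single output unit becomes a linear threshold on $H$ that separates the two XOR classes.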

3.  Optical  Implementation 

The optical implementation of a two-layer neural network trained by SDM requires both fixed and modifiable interconnection matrices. Dynamic volume holograms are very promising candidates for the implementation of such interconnection matrices because of the three-dimensional storage capacity available within the volume of a crystal, the well-studied dynamic response of photorefractive crystals, and the ability to fix photorefractive holograms. Nonlinear effects such as fanning in photorefractive crystals, generally a nuisance, are helpful for implementing the random interconnection matrix in the first layer. Spatial light modulators (SLMs) with nonlinear thresholding and amplification functions can be used to simulate neural response. In our experiment, liquid crystal light valves (LCLVs) manufactured by Hughes are used both to provide input and gain and as thresholding devices.

The schematic diagram of our two-layer system is shown in Fig. 2. The interconnection matrices are recorded in photorefractive crystals in the form of their Fourier transforms using an argon-ion laser ($\lambda = 514$ nm). The first layer consists of a video monitor (VM), a liquid crystal light valve (LCLV1), and a LiNbO$_3$ photorefractive crystal (PR1). There are 100 input units, arranged in a 10 × 10 pixel grid on which the input character patterns are drawn. Input patterns are presented on VM, imaged onto LCLV1 by an imaging lens (L1), and read out by the laser beam on the other side of LCLV1. Here, LCLV1 acts as an incoherent-to-coherent converter and also as an image amplifier. In the second layer, the hidden neuron array and the output neuron array are implemented by a second liquid crystal light valve (LCLV2) and a CCD detector, respectively. The interconnection weights are recorded in the second LiNbO$_3$ crystal (PR2). In this experiment, the hidden layer consists of an array of approximately 300 × 300 neurons. There are 26 output neurons, represented by 26 pixels in the CCD detector plane, each responding to one letter of the alphabet (A-Z). The network is trained one layer at a time: first the first layer, then the second.

During the training of the first layer, random dot patterns were used as training patterns; each was split into two parts, which were Fourier transformed by lenses L2 and L3. These two Fourier-transformed random patterns were used to record a hologram consisting of gratings of random strength. This process was repeated many times so that a volume hologram with random interconnection weights was recorded. Furthermore, in the crystal we used, the photorefractive nonlinearity is sufficiently strong that a laser beam passing through the crystal loses much of its power to a broad fan of light, resulting from amplification of radiation scattered by imperfections in the crystal [12] and from asymmetric refractive index change due to nonuniformity of the incident beam [13]. This phenomenon, called beam fanning, further randomized the interconnections we recorded and at the same time drastically increased the number of hidden neurons to which each input neuron is connected. The writing beams in the first layer were polarized in the extraordinary direction with respect to the crystal in order to obtain maximum fanning. In our experiment, each input neuron was connected to about 10^ hidden neurons. The resulting weight matrix therefore performs a dimension-expanding random mapping, which is exactly what is needed in the implementation of the SDM model. After the first-layer learning is finished, the random interconnection hologram is thermally fixed [14] and training of the second layer begins.

The goal of the second-layer training is that one of the 26 output neurons, whose spatial position is proportional to the order of the letter in the alphabet, is switched on when a character pattern is presented at the input of the network. This was achieved by training the second layer with the sum-of-outer-products rule. During this process, the training patterns (in our case the character patterns) were presented at the network input and randomly mapped into higher-dimensional hidden representations. These hidden representations were thresholded and amplified by LCLV2 and Fourier transformed by lens L5. Their Fourier-transform holograms were recorded in association with reference plane waves of appropriate spatial frequencies. The spatial frequencies of these reference beams were chosen according to the identity of the input patterns, so that the response of the hidden layer was added to the weights of the interconnections leading to the output neuron responsible for that input pattern. The direction of the reference beam was controlled by a mirror mounted on a computer-controlled motorized rotary stage. The writing beams were polarized in the ordinary direction and the reading beam in the extraordinary direction, in order to give maximum diffraction efficiency with minimum beam coupling during writing.

In order to compensate for hologram decay in photorefractive crystals, an exposure schedule [15] was followed during this learning process so that weight adaptation was done linearly; i.e., holograms were formed with equal strength, which essentially implemented the sum of outer products in Eq. (1). Let $A_m$ be the amplitude of the $m$th hologram recorded. After a total of $M$ exposures,

$$A_m = A_0 \left[1 - \exp\!\left(-\frac{t_m}{\tau}\right)\right] \exp\!\left(-\sum_{m'=m+1}^{M} \frac{t_{m'}}{\tau}\right), \tag{2}$$

where $A_0$ is the saturation amplitude of a hologram recorded in the photorefractive crystal, $t_m$ is the exposure time for the $m$th hologram, and $\tau$ is the characteristic time constant for recording or erasing a hologram in the crystal. According to Ref. 15, if we want to obtain the maximum diffraction efficiencies of recorded holograms with $A_m = A_{m+1}$ for all $m$, the exposure schedule that should be followed is

$$t_m = \tau \ln\!\left(\frac{m}{m-1}\right), \qquad m > 1, \tag{3}$$

with $t_1 \gg \tau$. With this schedule, the $m$th exposure raises the new hologram to $A_0[1 - (m-1)/m] = A_0/m$ and multiplies every earlier amplitude by $\exp(-t_m/\tau) = (m-1)/m$; the remaining exposures $m+1, \ldots, M$ reduce the $m$th hologram by a further factor $m/M$. This yields

$$A_m = \frac{A_0}{M}, \qquad m = 1, 2, \ldots, M. \tag{4}$$

In other words, the diffraction efficiency (in intensity) of each hologram is inversely proportional to $M^2$. Since the logarithms in Eq. (3) telescope, the total exposure time for recording $M$ holograms is

$$t = \sum_{m=1}^{M} t_m = t_1 + \tau \ln M. \tag{5}$$
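Equations (2) to (5) can be checked numerically. The sketch below is a minimal model under the same assumptions as Eq. (2) (one time constant $\tau$ for both writing and erasure, and each new hologram starting from zero amplitude); it builds the schedule of Eq. (3) and evaluates the final amplitudes. The function names are hypothetical.

```python
import math

def exposure_schedule(M, tau, t1):
    """Exposure times from Eq. (3): t_1 is given, t_m = tau * ln(m / (m - 1)) for m > 1."""
    return [t1] + [tau * math.log(m / (m - 1)) for m in range(2, M + 1)]

def final_amplitudes(times, tau, A0=1.0):
    """Amplitude of every hologram after all exposures, following Eq. (2).

    Hologram m grows to A0 * (1 - exp(-t_m / tau)) while it is written, then
    decays by exp(-t / tau) during each later exposure of duration t.
    """
    amps = []
    for m, tm in enumerate(times):
        later = sum(times[m + 1:])
        amps.append(A0 * (1 - math.exp(-tm / tau)) * math.exp(-later / tau))
    return amps

M, tau = 10, 1.0
times = exposure_schedule(M, tau, t1=20 * tau)   # t_1 >> tau, as Eq. (3) requires
amps = final_amplitudes(times, tau)
print([round(a, 6) for a in amps])               # ten values of 0.1, i.e. A0 / M
print(sum(times), 20 * tau + tau * math.log(M))  # Eq. (5): equal up to rounding error
```

The first hologram comes out slightly below $A_0/M$, by the factor $1 - e^{-t_1/\tau}$, which is why Eq. (3) asks for $t_1 \gg \tau$.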

The crystal we used for the second layer was an 8-mm-thick LiNbO$_3$ crystal doped with 0.01% Fe. Under our experimental conditions, the time constant $\tau$ was measured to be 425 seconds. During network training, the internal representations of 104 handwritten character patterns, 4 patterns from each of the 26 classes, needed to be recorded in the second-layer crystal with roughly equal diffraction efficiencies. The exposure time for each of these holograms except the first can be calculated from Eq. (3); for example, $t_2 = 295$ seconds and $t_{50} = 8.6$ seconds. The first exposure $t_1$ was chosen to be 25 minutes so that, according to Eq. (2), the first hologram reached saturation (maximum efficiency). Therefore, with $M = 104$, the total exposure time from Eq. (5) is $t = 58$ minutes.
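The numbers quoted above follow directly from Eqs. (2), (3), and (5). A short check, using only the values given in the text ($\tau$ = 425 s, $t_1$ = 25 min, $M$ = 104); the helper `t` is a hypothetical name:

```python
import math

tau = 425.0      # measured time constant, seconds
t1 = 25 * 60.0   # first exposure, seconds
M = 104          # holograms to store: 4 patterns x 26 classes

def t(m):
    """Exposure time of the m-th hologram, Eq. (3) for m > 1."""
    return t1 if m == 1 else tau * math.log(m / (m - 1))

total = sum(t(m) for m in range(1, M + 1))   # equals t1 + tau * ln(M), Eq. (5)

print(f"t_2   = {t(2):.0f} s")                 # t_2   = 295 s
print(f"t_50  = {t(50):.1f} s")                # t_50  = 8.6 s
print(f"total = {total / 60:.0f} min")         # total = 58 min
print(f"first hologram at {1 - math.exp(-t1 / tau):.3f} of saturation")  # 0.971
```

Since $t_1 \approx 3.5\tau$, the first hologram reaches about 97% of its saturation amplitude, which is what "reached saturation" means in practice here.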

Another important issue is the finite angular bandwidth of volume holograms. If the angular separation between the reference plane waves is too small, the presentation of any character pattern at the input may reconstruct several plane waves, so that several output neurons (corresponding to these reference waves) will be turned on. This leads to crosstalk or even misclassification. On the other hand, we want to keep the angular separation as small as possible to facilitate the construction of the optical system. To find an appropriate angular separation, we need to examine the angular bandwidth of volume holograms in the crystal, which is given by [16]

$$\Delta\theta_c = \frac{\lambda}{2 n_c d \sin\theta_c}, \tag{6}$$

where $\lambda$ is the laser wavelength in vacuum, $n_c$ is the refractive index of the crystal, $\theta_c$ is the angle of incidence of the writing beams in the crystal, and $d$ is the hologram thickness. In our experiment, the angle of incidence of the writing beams in air is $\theta_0 = 20^\circ$ and the index of refraction of the LiNbO$_3$ crystal is $n_c = 2.29$. Therefore $\theta_c$ can be solved from

$$n_c \sin\theta_c = \sin\theta_0, \tag{7}$$

which gives $\theta_c = 8.6^\circ$. With $\lambda = 0.514\ \mu\mathrm{m}$, $d = 8$ mm, and Eq. (6), $\Delta\theta_c = 0.0054^\circ$. Finally, we can find the angular bandwidth in air by differentiating Eq. (7), which yields

$$\Delta\theta_0 = \frac{n_c \cos\theta_c}{\cos\theta_0}\, \Delta\theta_c = 0.013^\circ. \tag{8}$$

To make sure that crosstalk due to the finite angular bandwidth is completely suppressed, we chose the angular separation between reference beams to be 0.03°, more than twice $\Delta\theta_0$. The total angular sweep of the reference beam is therefore 26 × 0.03° = 0.78°, which is reasonable for the motorized rotary stage and at the same time guarantees overlap of the two writing beams in the crystal.
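Equations (6) to (8) and the chosen beam spacing can be reproduced with a few lines of arithmetic. The sketch uses only the parameter values stated in the text; the variable names are illustrative.

```python
import math

lam = 0.514e-6              # vacuum wavelength, m
d = 8e-3                    # hologram thickness, m
n_c = 2.29                  # refractive index of the crystal, as given in the text
theta0 = math.radians(20)   # writing-beam angle of incidence in air

theta_c = math.asin(math.sin(theta0) / n_c)                       # Eq. (7)
dtheta_c = lam / (2 * n_c * d * math.sin(theta_c))                # Eq. (6), radians
dtheta_0 = dtheta_c * n_c * math.cos(theta_c) / math.cos(theta0)  # Eq. (8), radians

spacing = 0.03   # chosen reference-beam separation, degrees
print(f"theta_c  = {math.degrees(theta_c):.1f} deg")                   # 8.6 deg
print(f"dtheta_c = {math.degrees(dtheta_c):.4f} deg")                  # 0.0054 deg
print(f"dtheta_0 = {math.degrees(dtheta_0):.3f} deg")                  # 0.013 deg
print(f"spacing / dtheta_0 = {spacing / math.degrees(dtheta_0):.1f}")  # 2.3
print(f"total sweep = {26 * spacing:.2f} deg")                         # 0.78 deg
```

The ratio of about 2.3 between the chosen spacing and the in-air bandwidth is the safety margin against crosstalk between neighboring output neurons.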

A photograph of the experimental system is shown in Fig. 3. Figure 4 shows the experimental results, including the input patterns, their internal representations, and the responses of the output neurons. The input patterns shown in Fig. 4 were among those used for training the network. The different positions of the bright dots indicate which output neuron has the strongest response. As can be seen from Fig. 4, crosstalk was completely suppressed in these cases, mainly because of the drastically expanded dimension of the hidden representations and the nonlinear thresholding operation of the neurons. The hidden representations of different input patterns are also clearly distinguishable.

To check the generalization property of this network, 520 handwritten character patterns, 20 patterns from each class, were presented to the network, and the identity of each pattern was determined from which output neuron had the maximum response. Figure 5 shows some of the test patterns, and the results are summarized in Fig. 6, which gives the number of correct classifications out of 20 tests for each class. Of the 520 test patterns, 311 were correctly classified, a recognition rate of about 60% (311/520 ≈ 59.8%).

4.  Discussion 

Although the generalization performance of our optical network is not fully satisfactory, owing to the fixed first-layer weights and the limited number of training cycles for the second layer, it could be greatly improved if both layers were trained using an error-descent algorithm. Typically these iterative, error-driven algorithms require at least thousands of learning cycles, which means that the optical system would have to handle a huge number of hologram exposures. Previously recorded photorefractive holograms, however, decay as new holograms are recorded. Simulations have shown that learning is practically impossible with the decay rate given by Eq. (4) if thousands of learning cycles are needed. The crucial problem we will have to solve before a fully trainable multilayer network can be built is, therefore, to control the hologram decay rate. Furthermore, existing learning algorithms need to be modified to match the current hardware technology and to simplify optical system design. The success of these efforts should result in a fully trainable multilayer optical neural network with tremendous computational power and learning capability.
FIGURE CAPTIONS
1.  Kanerva’s Sparse, Distributed Memory (SDM) Model.

2.  Optical System Layout for the Two-Layer Neural Network. VM = Video Monitor, LCLV = Liquid Crystal Light Valve, PR = Photorefractive Crystal, (P)BS = (Polarizing) Beam Splitter, RM = Rotating Mirror, L = Lens, S = Shutter.

3.  Experimental Setup of the System in Fig. 2.

4.  Examples  of  the  Signals  at  the  Input  (top),  Hidden  (middle),  and  Output  (bottom) 
Layers  in  the  Experimental  System. 

5.  Examples  of  the  Test  Patterns. 

6.  Histogram of the Test Results.